|
Several binary representations of character sets for common Western European languages are compared in this article. These encodings were designed to represent Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic, which use the Latin alphabet, a few additional letters and letters with precomposed diacritics, some punctuation, and various symbols (including some Greek letters). Although these languages are called "Western European", many of them are spoken all over the world. These character sets also happen to support many other languages, such as Malay, Swahili, and Classical Latin.

== Summary ==
The ISO-8859 series of 8-bit character sets encodes all Latin character sets used in Europe, although the same code points are given different meanings in different parts of the series, which caused some difficulty. The arrival of Unicode, with a unique code point for every character, resolved these issues.
* ISO/IEC 8859-1, or Latin-1, is the most widely used and also defines the first 256 code points of Unicode.
* ISO/IEC 8859-15 modifies ISO-8859-1 to support Finnish and French and to add the euro sign.
* In terms of printable characters, Windows-1252 has everything that ISO-8859-1 and ISO-8859-15 have, and more.
* IBM CP437, being intended for English only, has few accented letters, but it has far more graphics characters than the others and also some Greek characters that are useful as technical symbols.
* IBM CP850 has all the printable characters that ISO-8859-1 has (albeit arranged differently) and still manages to have enough graphics characters to build a usable text-mode user interface.
* IBM CP858 differs from CP850 in only one character: the ''dotless i'' (ı), rarely used outside Turkey, was replaced by the ''euro currency sign'' (€).
* IBM CP859 contains all the printable characters that ISO-8859-15 has, so unlike CP850 it supports the euro sign (€) and French.
* IBM code pages 037, 500, and 1047 are EBCDIC encodings that include all of the ISO-8859-1 characters.
* The Mac OS Roman character set (often referred to as MacRoman and known to the IANA simply as MACINTOSH) has most, but not all, of the same characters as ISO-8859-1, but in a very different arrangement; it also adds many technical and mathematical characters and more diacritics. Older Macintosh web browsers were known to corrupt the few characters that were in ISO-8859-1 but not in their native Macintosh character set when editing text from Web sites. Conversely, in Web material prepared on an older Macintosh, many characters were displayed incorrectly when read on other operating systems.
* The euro sign postdates these (ISO-8859) specifications: conflicting ways of retrofitting it led to significant difficulty until Unicode became more generally adopted.

Source: Wikipedia, the free encyclopedia, "Western Latin character sets (computing)".
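
Several of the points in the summary list can be checked with a short script. The following is a minimal sketch using Python's standard codecs, not a definitive reference: it assumes that Python's codec names (<code>latin-1</code>, <code>iso8859-15</code>, <code>cp1252</code>, <code>cp437</code>, <code>cp850</code>, <code>cp858</code>, <code>mac_roman</code>) correspond to the character sets described here, and the helper names are hypothetical ones chosen for illustration.

```python
# Codec names below are Python's own aliases; they are assumed to match
# the character sets discussed in the article.
CODECS = ["latin-1", "iso8859-15", "cp1252", "cp437", "cp850", "cp858", "mac_roman"]


def euro_bytes(codec):
    """Return the bytes a codec uses for the euro sign, or None if it has none."""
    try:
        return "\u20ac".encode(codec)
    except UnicodeEncodeError:
        return None


def differing_bytes(codec_a, codec_b):
    """List the byte values that two single-byte codecs decode differently."""
    return [
        b for b in range(256)
        if bytes([b]).decode(codec_a, errors="replace")
        != bytes([b]).decode(codec_b, errors="replace")
    ]


def latin1_matches_unicode():
    """Check that Latin-1 byte values equal the first 256 Unicode code points."""
    return bytes(range(256)).decode("latin-1") == "".join(map(chr, range(256)))


# Show where (if anywhere) each codec places the euro sign.
for name in CODECS:
    encoded = euro_bytes(name)
    print(name, "no euro sign" if encoded is None else hex(encoded[0]))

# Show which byte values separate CP850 from CP858.
print([hex(b) for b in differing_bytes("cp850", "cp858")])
print(latin1_matches_unicode())
```

The same <code>differing_bytes</code> helper can be pointed at any other pair of single-byte codecs, for example <code>latin-1</code> and <code>iso8859-15</code>, to see which positions were reassigned.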
|